The number of scientific publications continues to rise exponentially, especially in Computer Science (CS). However, current solutions for analyzing those publications restrict access behind a paywall, offer no features for visual analysis, limit access to their data, focus only on niches or sub-fields, and/or are not flexible and modular enough to be transferred to other datasets. In this thesis, we conduct a scientometric analysis to uncover the implicit patterns hidden in CS metadata and to determine the state of CS research. Specifically, we investigate trends in the quantity, impact, and topics for authors, venues, document types (conferences vs. journals), and fields of study (compared to, e.g., medicine). To achieve this, we introduce the CS-Insights system, an interactive web application to analyze CS publications with various dashboards, filters, and visualizations. The data underlying this system is the DBLP Discovery Dataset (D3), which contains metadata from 5 million CS publications. Both D3 and CS-Insights are open-access, and CS-Insights can be easily adapted to other datasets in the future. The most interesting findings of our scientometric analysis include that i) there has been a stark increase in publications, authors, and venues in the last two decades, ii) many authors only recently joined the field, iii) the most cited authors and venues focus on computer vision and pattern recognition, while the most productive prefer engineering-related topics, iv) researchers' preference for publishing in conferences over journals is dwindling, v) on average, journal articles receive twice as many citations as conference papers, but the contrast is much smaller for the most cited conferences and journals, and vi) journals also receive more citations in all other investigated fields of study, while only CS and engineering publish more in conferences than journals.
Charisma is considered one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence (AI) perspective in providing it with such a skill. Beyond that, a plethora of use cases opens up for the computational measurement of human charisma, such as tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples from scientific studies include influence (could help) and affability (would help); a popular concept comprises power (could help) as well as presence and warmth (both would help). Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Moreover, automatic measurement appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective, including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, hindered by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also use LidarCLIP as a tool to investigate fundamental lidar capabilities through natural language. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. We hope LidarCLIP can inspire future work to dive deeper into connections between text and point cloud understanding. Code and trained models are available at https://github.com/atonderski/lidarclip.
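The supervision described in the LidarCLIP abstract, training a point cloud encoder against frozen CLIP image embeddings, amounts to a feature-distillation objective. The sketch below shows one plausible form of that objective (mean cosine distance between paired embeddings); the function name and the exact loss form are illustrative assumptions, not the paper's verified implementation.

```python
import numpy as np

def distillation_loss(lidar_emb, image_emb):
    """Mean (1 - cosine similarity) between paired lidar and image embeddings.

    Minimising this pulls the point cloud encoder's outputs toward the frozen
    CLIP image embeddings, so lidar features end up comparable to CLIP text
    embeddings as well, with the image domain as the intermediary.
    Both inputs: arrays of shape (batch, embed_dim).
    """
    l = lidar_emb / np.linalg.norm(lidar_emb, axis=1, keepdims=True)
    i = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(l * i, axis=1)))
```

Once trained this way, text-to-lidar retrieval reduces to ranking point cloud embeddings by cosine similarity against a CLIP text embedding.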
Stable Diffusion is a recent open-source image generation model comparable to proprietary models such as DALL-E, Imagen, or Parti. Stable Diffusion comes with a safety filter that aims to prevent generating explicit images. Unfortunately, the filter is obfuscated and poorly documented. This makes it hard for users to prevent misuse in their applications, and to understand the filter's limitations and improve it. We first show that it is easy to generate disturbing content that bypasses the safety filter. We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content. Based on our analysis, we argue that safety measures in future model releases should strive to be fully open and properly documented to stimulate security contributions from the community.
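The reverse-engineered filter's core check can be sketched as an embedding-similarity test: the generated image's CLIP embedding is compared against a fixed set of pre-computed "unsafe concept" embeddings, and the image is blocked if any similarity exceeds that concept's threshold. The concept vectors and thresholds below are placeholders, not the real filter's values.

```python
import numpy as np

def is_flagged(image_emb, concept_embs, thresholds):
    """Return True if the image embedding is 'too similar' to any unsafe
    concept. image_emb: shape (dim,); concept_embs: shape (n_concepts, dim);
    thresholds: shape (n_concepts,), one cutoff per concept."""
    img = image_emb / np.linalg.norm(image_emb)
    cons = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = cons @ img  # cosine similarity to each concept
    return bool(np.any(sims > thresholds))
```

A design like this explains the gap the abstract identifies: content categories with no corresponding concept embedding (violence, gore) pass through unexamined.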
In this paper, we demonstrate that deep learning-based methods can be used to fuse multi-object densities. Given a scenario with several sensors that possibly have different fields of view, a tracker runs locally in each sensor and produces a random finite set multi-object density. To fuse the outputs of the different trackers, we adapt a recently proposed transformer-based multi-object tracker, where the fusion result is a global multi-object density describing the set of all living objects at the current time. We compare the performance of the transformer-based fusion method with that of a model-based Bayesian fusion method in several simulated scenarios with different parameter settings, using synthetic data. The simulation results show that the transformer-based fusion method outperforms the model-based Bayesian method in our experimental scenarios.
Optical coherence tomography (OCT) is a micrometer-scale volumetric imaging modality that has become a clinical standard in ophthalmology. OCT instruments image by raster-scanning a focused light spot across the retina, acquiring sequential cross-sectional images to generate volumetric data. Patient eye motion during acquisition poses unique challenges: non-rigid, discontinuous distortions can occur, leading to gaps in the data and distorted topographic measurements. We present a new distortion model and a corresponding fully automatic, reference-free optimization strategy for computational motion correction in orthogonally raster-scanned retinal OCT volumes. Using a novel, domain-specific spatiotemporal parameterization, eye motion can be corrected continuously for the first time. Temporal regularization of the parameter estimation improves robustness and accuracy over previous spatial approaches. We correct each A-scan individually in 3D in a single mapping, including repeated acquisitions as used in OCT angiography protocols. Specialized 3D forward image warping reduces the median runtime to under 9 s, fast enough for clinical use. We quantitatively evaluate our method on 18 subjects with ocular pathology and demonstrate accurate correction during microsaccades. Lateral correction is limited only by ocular tremor, while axial correction achieves submicron repeatability (0.51 um median), a significant improvement over previous work. This allows assessing longitudinal changes in focal retinal pathologies as a marker of disease progression or treatment response, and promises to enable multiple new capabilities, such as supersampled or super-resolution reconstruction and the analysis of pathological eye motion occurring in neurological disease.
Achieving secure and reliable high-bandwidth, low-latency connectivity between automated vehicles and external servers, intelligent infrastructure, and other road users is a core step toward making fully automated driving possible. The availability of data interfaces that allow such connectivity has the potential to distinguish the capabilities of artificial agents in connected, cooperative, and automated mobility systems from those of human operators, who lack such interfaces. Connected agents can, for example, share data to build a collective environment model, plan collective behavior, and learn collectively from centrally combined shared data. This paper presents multiple solutions that allow connected entities to exchange data. In particular, we propose a new universal communication interface that uses the Message Queuing Telemetry Transport (MQTT) protocol to connect agents running the Robot Operating System (ROS). Our work integrates methods to evaluate the connection quality in the form of various key performance indicators. We compare several approaches that provide the connectivity required for the exemplary use case of edge-cloud lidar object detection in a 5G network. We show that the mean latency between the availability of vehicle-based sensor measurements and the reception of a corresponding object list from the edge cloud is below 87 ms. All implemented solutions are open source and free to use. The source code is available at https://github.com/ika-rwth-aachen/ros-v2x-benchmarking-suite.
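The MQTT-based ROS interface and the latency KPI described above can be sketched minimally as follows. All class, topic, and field names here are hypothetical illustrations, not the API of the linked benchmarking suite; the injected client stands in for any MQTT implementation exposing a `publish(topic, payload)` method, such as `paho.mqtt.client.Client`.

```python
import json
import time

class RosToMqttBridge:
    """Minimal sketch of a ROS-to-MQTT forwarder. A ROS subscriber callback
    hands each message, reduced to a plain dict, to `forward`, which
    serialises it as JSON and publishes it through the injected MQTT-style
    client."""

    def __init__(self, mqtt_client, mqtt_topic):
        self.client = mqtt_client
        self.topic = mqtt_topic

    def forward(self, msg):
        payload = json.dumps({**msg, "bridge_stamp": time.time()})
        self.client.publish(self.topic, payload)
        # KPI: latency from the sensor timestamp carried in the message
        # to the moment of publishing.
        return time.time() - msg["stamp"]
```

In the edge-cloud object detection use case, the same pattern would run in both directions: sensor data up to the cloud, detected object lists back to the vehicle, with the per-message latency aggregated into the reported KPI.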
Restricted Boltzmann machines (RBMs) provide a versatile architecture for unsupervised machine learning that can, in principle, approximate any target probability distribution with arbitrary accuracy. However, the RBM model is usually not directly accessible due to its computational complexity, and Markov-chain sampling is invoked to analyze the learned probability distribution. For training and eventual applications, it is therefore desirable to have a sampler that is both accurate and efficient. We highlight that these two goals generally compete and cannot be achieved simultaneously. More specifically, we identify and quantitatively characterize three regimes of RBM learning: independent learning, where the accuracy improves without losing efficiency; correlation learning, where higher accuracy entails lower efficiency; and degradation, where both accuracy and efficiency no longer improve or even deteriorate. These findings are based on numerical experiments and heuristic arguments.
Throughout the coronavirus disease 2019 (COVID-19) pandemic, decision makers have relied on forecasting models to determine and implement non-pharmaceutical interventions (NPIs). When building forecasting models, continuously updated datasets from various stakeholders, including developers, analysts, and testers, are needed to provide precise predictions. Here, we report the design of a scalable pipeline that serves as a data synchronization layer to support an international, top-down, spatiotemporal observation and forecasting model of COVID-19, named where2test, for Germany, Czechia, and Poland. We have built an operational data store (ODS) using PostgreSQL to continuously consolidate datasets from multiple data sources, perform collaborative work, facilitate high-performance data analysis, and track changes. The ODS is built to store not only COVID-19 data from Germany, Czechia, and Poland, but also data from other domains. The metadata schema adopts a dimensional fact model, enabling synchronization of the various data structures of these regions, and is extensible to the entire world. The ODS is then populated using batch extract, transform, and load (ETL) jobs. SQL queries are subsequently created to reduce the need for data preprocessing by users. The data can then support not only forecasting via a version-controlled ARIMA-Holt model and other analyses for decision making, but also a risk calculator and optimization applications. The data synchronization runs at daily intervals, and the results are displayed at https://www.where2test.de.
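A daily ETL merge into an ODS like the one described above typically takes the form of an upsert, so that re-running a batch updates existing rows instead of duplicating them. The sketch below uses hypothetical table and column names, and sqlite3 only to keep the example self-contained; the production store described in the abstract is PostgreSQL, where `INSERT ... ON CONFLICT` works analogously.

```python
import sqlite3

# In-memory stand-in for the operational data store.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cases (
    region    TEXT,
    date      TEXT,
    new_cases INTEGER,
    PRIMARY KEY (region, date))""")

def load_batch(rows):
    """Load one daily batch: insert new (region, date) rows, and update
    rows that already exist (e.g. retroactively corrected case counts)."""
    conn.executemany(
        """INSERT INTO cases (region, date, new_cases) VALUES (?, ?, ?)
           ON CONFLICT(region, date) DO UPDATE
           SET new_cases = excluded.new_cases""",
        rows)
    conn.commit()

load_batch([("Saxony", "2022-01-01", 100), ("Liberec", "2022-01-01", 40)])
load_batch([("Saxony", "2022-01-01", 120)])  # corrected figure on re-run
```

Keeping the merge idempotent in this way is what allows the daily synchronization to run unattended: a failed or repeated batch converges to the same table state.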
Hyperparameter optimization (HPO) is a key component of machine learning models for achieving peak predictive performance. While numerous methods and algorithms for HPO have been proposed over the last years, little progress has been made in illuminating and examining the actual structure of these black-box optimization problems. Exploratory landscape analysis (ELA) subsumes a set of techniques that can be used to gain knowledge about properties of unknown optimization problems. In this paper, we evaluate the performance of five different black-box optimizers on 30 HPO problems, which consist of two-dimensional continuous search spaces of an XGBoost learner trained on 10 different datasets. This is contrasted with the performance of the same optimizers evaluated on 360 problem instances of the black-box optimization benchmark (BBOB). We then compute ELA features on the HPO and BBOB problems and examine their similarities and differences. A cluster analysis of the HPO and BBOB problems in ELA feature space allows us to identify how HPO problems compare to BBOB problems on a structural meta-level. We identify a subset of BBOB problems that are close to the HPO problems in ELA feature space and show that optimizer performance is similar on these two sets of benchmark problems. We highlight open challenges of ELA for HPO and discuss potential directions of future research and applications.
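ELA features are computed from a sample of points and their objective values, without any knowledge of the function's internals. As a stand-in for the richer feature sets used in such studies (y-distribution, meta-model, information content, etc.), the sketch below computes one classic landscape feature, fitness-distance correlation, on a sampled 2-D test function; the specific feature choice here is illustrative, not a claim about the paper's feature set.

```python
import numpy as np

def fitness_distance_correlation(X, y):
    """Correlation between sample fitness values and their distance to the
    best sampled point. Values near +1 (for minimisation) indicate a
    globally 'funnel-shaped' landscape; values near 0 or negative indicate
    deceptive or rugged structure."""
    best = X[np.argmin(y)]
    d = np.linalg.norm(X - best, axis=1)
    return float(np.corrcoef(y, d)[0, 1])

# Sample a unimodal sphere function: fitness grows with distance to the
# optimum, so the correlation should be strongly positive.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(200, 2))
y = np.sum(X**2, axis=1)
fdc = fitness_distance_correlation(X, y)
```

Stacking many such features per problem yields the feature vectors on which the clustering of HPO and BBOB instances operates.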